ABSTRACT

The purpose of the present investigation is to 
examine the influence of sample size (N) and model complexity on a 
set of 23 goodness-of-fit (GOF) indices, including those typically
used in confirmatory factor analysis. The focus was on two potential
problems in assessing GOF: (1) some fit indices are substantially
influenced by N so that tests of the same model based on the same
variables for a new sample from the same population are not directly
comparable unless N is also held constant; and (2) the inclusion of
additional parameters may provide an illusory improvement in fit. For
data simulated from each of two different population models, values 
for 17 of the 23 fit indices were at least moderately influenced by 
N, and many of these indices failed to control sufficiently for the 
inclusion of superfluous parameters (i.e., parameters that had zero 
values in the population model). Four of the indices were relatively
independent of N and were not significantly affected by the inclusion
of superfluous parameters. The four recommended indices are two
measures of fit based on the non-centrality parameter proposed by R.
P. McDonald, the widely known incremental (relative) index developed
by L. R. Tucker and C. Lewis (1973), and a new incremental index, the
McDonald-Marsh Index, that is based on one of McDonald's
non-centrality indices. Descriptions of the 23 GOF indices used, 10 
graphs, and 5 data tables are provided. (Author/TJH) 
Goodness of Fit in Confirmatory Factor Analysis:
The Effects of Sample Size and Model Complexity

ABSTRACT 

The purpose of the present investigation is to examine the influence of 
sample size (N) and model complexity on a set of 23 goodness-of-fit indices
including those typically used in confirmatory factor analysis. For data
simulated from each of two different population models, values for 17 of the
23 fit indices were at least moderately influenced by N, and many of these
indices failed to control sufficiently for the inclusion of superfluous
parameters (i.e., parameters that had zero values in the population model).
Four of the indices were relatively independent of N and were not 
significantly affected by the inclusion of superfluous parameters. The 4 
recommended indices are two measures of fit based on the noncentrality 
parameter proposed by McDonald (in press), the widely known incremental 
(relative) index developed by Tucker and Lewis (1973), and a new 
incremental index called the McDonald-Marsh Index (MMI) that is based on one
of McDonald's noncentrality indices.

The purpose of the present investigation is to examine the influence of
sample size (N) and of model complexity on different goodness-of-fit indices
used in confirmatory factor analysis (CFA). In CFA, responses to p observed
variables by N subjects are summarized by a (p x p) sample covariance matrix,
and it is hypothesized that the corresponding population covariance matrix
can be described by K parameters, namely the factor loadings, the factor
variances and covariances, and the residual variances. To the extent that
the fitted population covariance matrix Σ derived from a set of (in some
sense) best-fitting parameters is similar to the observed sample covariance
matrix S, the model is supported. The problem of goodness of fit is how to
decide whether Σ is sufficiently similar to S to justify the conclusion that
a specific model adequately fits a particular set of data. The present
focus is how goodness of fit, as assessed with a variety of indices, varies
with N, the number of cases in the data to be fit, and with model complexity
as measured by K, the number of parameters estimated in a series of nested
models.

The classical form of statistical hypothesis testing is generally
inappropriate for evaluation of fit in CFA. Cudeck and Browne (1983) noted
that since hypothesized models are best regarded as approximations to
reality rather than exact statements of truth, any model can be rejected if
the sample size is sufficiently large. From this perspective they argued
that it is preferable to abandon the statistical hypothesis testing
approach. Similarly, Joreskog and Sorbom argued that statistical hypothesis
testing is generally inappropriate because "the statistical problem is not
one of testing a given hypothesis (which a priori may be considered false)
but rather one of fitting the model to data and to decide whether the fit is
adequate or not" (p. I.38-39). McDonald (1983, p. 56) also noted that
hypothesis testing is inappropriate for selecting a restrictive model since
"all common factor hypotheses are false, because all restrictive hypotheses
are false, and they will be proven false by the use of a sufficiently large
sample size." In actual application only the "saturated" model can be true.
Accordingly, a large number of fit indices have been proposed (e.g., Akaike,
1974; Bollen, 1986; Bentler & Bonett, 1980; Bozdogan, 1987; Cudeck & Browne,
1983; Hoelter, 1983; Horn & McArdle, 1980; James, Mulaik & Brett, 1982;
Joreskog & Sorbom, 1981; Marsh, Balla & McDonald, 1988; McArdle, 1986;
McDonald, in press; McDonald & Marsh, 1988; Schwartz, 1978; Steiger & Lind,
1980; Tanaka, 1987; Tanaka & Huba, 1986; Tucker & Lewis, 1973) to facilitate
the evaluation of fit and the comparison of alternative models.

Desirable Characteristics of Fit Indices.

The focus of the present investigation is on two potential problems in
assessing goodness of fit. First, some fit indices are substantially
influenced by N so that tests of the same model based on the same variables
for a new sample from the same population are not directly comparable unless
N is also held constant. Such an effect of N also makes problematic any
guidelines of what constitutes an acceptable fit. Thus, some researchers
have developed fit indices that are claimed to be relatively independent of
N. Second, the inclusion of additional parameters, particularly when based
on a posteriori criteria and tested with the same data, may provide an
illusory improvement in fit. Thus, some researchers have developed fit
indices that are claimed to compensate for capitalization on chance. From
these perspectives, an ideal index of fit would be relatively independent of
N, provide an accurate measure of goodness of fit for competing models, vary
along a well-defined continuum that is easily interpreted, and control
appropriately for model complexity.

Many researchers have examined the effect of N on goodness of fit
(e.g., Anderson & Gerbing, 1984; Bearden, Sharma & Teel, 1982; Bentler &
Bonett, 1980; Bollen, 1986; Boomsma, 1982; Cudeck & Browne, 1983; Gerbing &
Anderson, 1985; Hoelter, 1983; Joreskog & Sorbom, 1981; Marsh, Balla &
McDonald, 1988; Marsh & McDonald, 1988) and some have proposed fit indices
that are claimed to be independent of N. Marsh, Balla, and McDonald used
actual and simulated data to demonstrate that nearly all frequently used
indices are substantially influenced by N. Of the more than 30 indices that
they considered, the Tucker-Lewis index (TLI) was the only frequently used
index that was relatively independent of N.

Researchers have also examined the effect of the number of parameters
included in the hypothesized model on goodness of fit (e.g., Akaike, 1974;
1981; Anderson & Gerbing, 1984; Bentler & Bonett, 1980; Boomsma, 1982;
Bozdogan, 1987; Cudeck & Browne, 1983; Gerbing & Anderson, 1985; James,
Mulaik & Brett, 1982; Joreskog & Sorbom, 1981; Schwartz, 1978; Tucker &
Lewis, 1973). Many fit indices are monotonically related to model complexity
as measured by the number of parameters estimated in a series of nested
models so that for sample data goodness of fit will continue to improve with
the addition of more parameters so long as the df is positive. From this
perspective the best fitting model will always be the saturated model with
df=0. However, for sample data this improved fit due to the inclusion of
additional parameters may be due to capitalization on chance. Furthermore,
the parameter estimates for a saturated model may be uninterpretable and
researchers often seek more parsimonious models that are both theoretically
defensible and able to describe their data adequately.

Researchers have approached this problem of evaluating fit in relation
to model complexity from different perspectives. For example, James et al.
(1982, p. 155) ask "how efficient is the increase in fit going from the null
model with many degrees of freedom to another model with just a few degrees
of freedom in terms of degrees of freedom lost in estimating more
parameters?" Joreskog and Sorbom (1981, p. I.40) note that when the change
in χ² is close to the difference in df due to the addition of new
parameters, then the "improvement in fit is obtained by 'capitalizing on
chance,' and that the added parameters may not have real significance and
meaning." Cudeck and Browne (1983; also see Marsh, 1987) proposed the method
of cross-validation to determine the ability of a set of parameter estimates
to adequately describe data based on new observations from the same
population and to determine the extent to which capitalization on chance has
occurred. Cudeck and Browne also demonstrated the use of CAK and CSK (see
definition in Appendix 1), indices described by Akaike (1974) and by
Schwartz (1978) respectively that were rescaled in terms of FF (see Appendix
1), for this purpose. Bozdogan (1987) noted that model selection requires
researchers to achieve an appropriate balance between problems associated
with overfitting and underfitting the data, and that different fit indices
vary in the balance of protection that they offer from these conflicting
possibilities. Similarly, McDonald (in press) noted the need to strike a
balance between badness of fit and model complexity or, equivalently,
between goodness of fit and model parsimony. He further noted that this
compromise is not an issue of sampling in that even if the true population
were known, an appropriate compromise would still be required.
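
The guideline attributed above to Joreskog and Sorbom can be made concrete
with a small sketch. Under the usual large-sample theory, adding parameters
whose population values are zero lowers χ² by about one unit per parameter
on average, so a ratio of change in χ² to change in df near 1 is what chance
alone would produce. The function name and the χ² values below are
hypothetical illustrations and are not results from this study.

```python
def chi_square_change_per_df(chi2_simple, df_simple, chi2_complex, df_complex):
    """Change in chi-square per degree of freedom given up by the larger model."""
    delta_df = df_simple - df_complex
    if delta_df <= 0:
        raise ValueError("the more complex model must have fewer df")
    return (chi2_simple - chi2_complex) / delta_df


# Hypothetical nested models: three parameters added to the simpler model.
ratio = chi_square_change_per_df(30.1, 24, 26.9, 21)
print(round(ratio, 2))  # a value near 1 is consistent with capitalization on chance
```

A ratio well above 1 would instead suggest that the added parameters
capture real structure, although the ratio alone is not a significance test.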

Cudeck and Browne (1983) examined the joint influence of sample size
and model complexity on goodness of fit. They considered the CAK and CSK
indices that are a function of the number of estimated parameters. These
indices are possibly a useful indication of fit for comparing competing
models that vary in the number of parameters used to describe the same data
and have recently received much attention (e.g., Bozdogan, 1987). Cudeck and
Browne's results, as well as results by Marsh, Balla and McDonald (1988),
show empirically that these indices are substantially influenced by N, and
McDonald (in press) demonstrated that this relation was inherent in the
mathematical form of the indices. The Akaike index penalized the inclusion
of additional parameters less severely than the Schwartz index so that it
consistently led to the selection of more complex models (see Bozdogan,
1987). This effect of sample size need not invalidate the use of these
indices for purposes of model selection if the effects of N are relatively
constant across the different models. That is, the same model may be
selected as "best" for each of the different sample sizes even though the
actual values of the fit indices varied according to sample size. However,
Cudeck and Browne found that the relative fit of competing models did vary
with N. For small sample sizes, simple models positing fewer parameters had
better fit indices whereas for large sample sizes more complicated models
positing more parameters, and, ultimately, for sufficiently large sample
sizes, the saturated model, had better fit indices. As noted by McDonald
(in press), two studies differing only in sample size would on average lead
to the support of models differing in complexity and no investigator would
reasonably use such indices if the sample size were large enough to require
the selection of an uninterpretably complex model.
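
The difference in severity between the Akaike and Schwartz penalties can
be illustrated with their generic textbook forms, 2 per parameter and ln(N)
per parameter. This is only a sketch under that assumption: the CAK and CSK
indices used here are rescaled versions defined in Appendix 1, which is not
reproduced in this section, and the function names and the choice of three
added parameters are hypothetical.

```python
import math

SAMPLE_SIZES = (50, 100, 200, 400, 800, 1600)


def akaike_penalty(k):
    """Generic Akaike-style complexity penalty: 2 per free parameter."""
    return 2.0 * k


def schwarz_penalty(k, n):
    """Generic Schwarz-style complexity penalty: ln(N) per free parameter."""
    return k * math.log(n)


extra_parameters = 3  # hypothetical number of added parameters
for n in SAMPLE_SIZES:
    print(n, akaike_penalty(extra_parameters),
          round(schwarz_penalty(extra_parameters, n), 2))
```

Because ln(N) exceeds 2 for every N of 8 or more, the ln(N) penalty is the
heavier one at all six sample sizes considered here, and the gap widens as
N grows.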

The Present Investigation

Our objective is to examine the effect of model complexity and of N on
a set of 23 goodness of fit indices. Data were generated from one of two
known population models and a variety of models used to fit the data were
developed in relation to these known population models. Some models posited
parameters to be zero that were known to be non-zero for the population
model, thus providing models that were under-fit. Other models estimated
values for superfluous parameters that were known to be zero for the
population model, thus providing models that were over-fit. Covariance
matrices to be fit by the alternative models were based on one of six
different sample sizes varying from 50 to 1600.

The set of 23 goodness of fit indices considered here are described in
more detail in Appendix I. For present purposes the indices are classified
into three types, namely: (a) stand-alone (absolute) indices, (b) type-1
incremental (relative) indices, and (c) type-2 incremental (relative)
indices. The 13 stand-alone indices are based on the results of just a
target model, the a priori model posited by the researcher to fit the data.
These indices are provided by, or easily computed from results provided by,
LISREL and most other statistical packages used to fit structural equation
models. The incremental indices are based on the difference between the
target model and an alternative model such as a "null" model in which Σ is a
diagonal matrix (Bentler & Bonett, 1980). Incremental type-2 indices
incorporate an expected value of an index for a true model whereas
incremental type-1 indices do not (see Appendix). Marsh, Balla and McDonald
(1988) examined 19 of the 23 indices considered here (all but Dk, Mc, Z,
and the McDonald-Marsh Index, MMI) and found that only the TLI was
relatively independent of N (also see footnote 1). McDonald (in press)
indicated that his Dk and Mc indices were relatively independent of N. Z,
because it is monotonically related to χ², should be affected by sample
size. The MMI was developed for purposes of the present investigation. 2
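
The distinction between type-1 and type-2 incremental indices can be
sketched for the χ²/df ratio, whose expected value for a true model is
approximately 1. The general forms below are an assumption made for
illustration (the definitions actually used are in the Appendix), and the
χ² values are hypothetical. The type-2 form applied to χ²/df is the familiar
Tucker-Lewis index.

```python
def type1_incremental(null_value, target_value):
    """Improvement over the null model as a proportion of the null value."""
    return (null_value - target_value) / null_value


def type2_incremental(null_value, target_value, expected_value):
    """Improvement over the null model relative to the improvement that a
    true model would be expected to achieve."""
    return (null_value - target_value) / (null_value - expected_value)


def tucker_lewis(chi2_null, df_null, chi2_target, df_target):
    """TLI: type-2 incremental index for chi-square/df, with expected value 1."""
    return type2_incremental(chi2_null / df_null, chi2_target / df_target, 1.0)


# Hypothetical values: null model chi-square 720 on 36 df,
# target model chi-square 48 on 24 df.
type1 = type1_incremental(720 / 36, 48 / 24)
tli = tucker_lewis(720, 36, 48, 24)
print(round(type1, 3), round(tli, 3))
```

A target model whose χ²/df equals its expected value of 1 yields a type-2
value of exactly 1, whereas the corresponding type-1 value stays below 1.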

Method 

The CFA Model and Analyses

All analyses were conducted with LISREL V (Joreskog & Sorbom, 1981)
using the method of maximum likelihood. In each of the analyses involving 9
observed variables a set of eight substantive models posited between 18 and
33 parameters to define 1, 2, or 3 factors. Hence the df (.5 x 9 x 10 - K)
varied from 27 to 12. These eight models and their relation to the
population model used to generate the data are summarized in Table 1. A null
model was also tested for each covariance matrix such that the reproduced
covariance matrix was a diagonal matrix of variances and the nine measured
variables were posited to be uncorrelated. The df for the null model (.5 x 9
x 10 - 9 = 36) was constant for all the analyses. These nine models, the
eight substantive models and the null model, were tested for each of 120
covariance matrices described below.
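
The degrees of freedom quoted above follow from counting the distinct
elements of a 9 x 9 covariance matrix and subtracting the number of
estimated parameters. A minimal sketch of that arithmetic, with
hypothetical function names:

```python
def covariance_elements(p):
    """Number of distinct variances and covariances among p observed variables."""
    return p * (p + 1) // 2


def degrees_of_freedom(p, k):
    """df = .5 x p x (p + 1) - K for a model with K estimated parameters."""
    return covariance_elements(p) - k


P = 9
# Parameter counts from the text: the substantive models range from 18 to 33
# parameters, and the null model estimates only the 9 variances.
df_fewest_parameters = degrees_of_freedom(P, 18)
df_most_parameters = degrees_of_freedom(P, 33)
df_null = degrees_of_freedom(P, 9)
print(df_fewest_parameters, df_most_parameters, df_null)  # 27 12 36
```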



Insert Tables 1 & 2 About Here 



The Data

The Sample Sizes. The six sample sizes to be considered in the present
investigation, 50, 100, 200, 400, 800 and 1600, were selected to span the
range of sample sizes typically considered in CFA. For each of the two data
sets to be considered, ten random samples were generated for each sample
size and the same nine models were fit to these 120 (2 data sets x 6 sample
sizes x 10 cases) covariance matrices.

Simple Structure Simulated Data (SSIM). The nine measured variables
were defined with the random number generator from the commercially
available SPSS package (Hull & Nie, 1981). Each variable was defined to
reflect only one factor (factor loadings were .6, .7 or .8) and a normally
distributed random error component, and the three factors were defined to be
correlated (factor covariances were .08, .12, and .24). A total of 31,500
cases were generated and divided into 60 sets of data such that each sample
size was represented by 10 covariance matrices. The eight substantive models
and the null model were fit to each of the 60 covariance matrices.
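
The simple structure population described above can be sketched as the
covariance matrix Σ = ΛΦΛ' + Θ, from which multivariate normal cases are
then drawn. Several details below are assumptions that the text does not
state: which variables receive which loading, factor variances of 1, and
unique variances chosen so that each observed variance is 1. The sketch also
samples directly from the implied multivariate normal distribution rather
than reproducing the SPSS procedure, and all function names are
hypothetical.

```python
import math
import random

LOADINGS = [0.6, 0.7, 0.8]          # values from the text; assignment assumed
PHI = [[1.0, 0.08, 0.12],           # factor covariances from the text;
       [0.08, 1.0, 0.24],           # unit factor variances are assumed
       [0.12, 0.24, 1.0]]
SAMPLE_SIZES = [50, 100, 200, 400, 800, 1600]
REPLICATIONS = 10


def lambda_matrix():
    """9 x 3 simple-structure loading matrix: one nonzero loading per row."""
    lam = [[0.0] * 3 for _ in range(9)]
    for f in range(3):
        for j, value in enumerate(LOADINGS):
            lam[3 * f + j][f] = value
    return lam


def population_covariance(lam, phi):
    """Sigma = Lambda Phi Lambda' + Theta, with Theta set so that every
    observed variance equals 1 (an assumption)."""
    p, m = len(lam), len(phi)
    sigma = [[sum(lam[i][a] * phi[a][b] * lam[j][b]
                  for a in range(m) for b in range(m))
              for j in range(p)] for i in range(p)]
    for i in range(p):
        sigma[i][i] = 1.0  # common variance plus unique variance
    return sigma


def cholesky(a):
    """Lower-triangular L with L L' = a, for symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_sample(n, sigma, rng):
    """Draw n multivariate normal cases with covariance matrix sigma."""
    low = cholesky(sigma)
    p = len(sigma)
    cases = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        cases.append([sum(low[i][k] * z[k] for k in range(i + 1))
                      for i in range(p)])
    return cases


sigma = population_covariance(lambda_matrix(), PHI)
sample = draw_sample(SAMPLE_SIZES[0], sigma, random.Random(1))
print(len(sample), len(sample[0]))
print(REPLICATIONS * sum(SAMPLE_SIZES))  # total cases across the 60 data sets
```

Ten replications at each of the six sample sizes account for the 31,500
cases mentioned in the text.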

The population model used to generate this data was one of the
substantive models to be considered (3SF, see Table 1) and thus was the most
parsimonious model (i.e., contained the fewest estimated parameters) able to
fit the data. Models positing only one or two factors (1UF and 2UF in Table
1) should not be able to fit the data. In each of the remaining five
substantive models, all the parameters in the 3SF model are included along
with a varying number of additional parameters. These additional parameters
are superfluous in that their population values, the values from the
population models used to generate the data, are zero. The fit indices of
these over-fit models are used to evaluate how various indices are affected
by capitalization on chance. To the extent that any of these over-fit models
fit the SSIM data significantly better than the 3SF model according to a
particular index, then the index does not control for the effects of
capitalization on chance. To the extent that any of these models fit the
data significantly more poorly than the 3SF model according to a particular
index, then, perhaps, the index over-compensates for capitalization on
chance. This relation between the substantive models to be tested and the
SSIM data was the basis of a priori contrasts used to compare various models
(see Table 2).

Complex Structure Simulated Data (CSIM). The nine measured variables
were defined as with the SSIM except that six of the nine measured variables,
two for each factor, were defined such that each should have a small
loading (.2) on one factor in addition to the one it was designated to
reflect. (In Table 1, the 9 factor loadings corresponding to those in the
SSIM data are called major factor loadings whereas the additional 6 factor
loadings in the model used to generate the CSIM data are called minor
loadings). Again a total of 31,500 cases were generated and divided into 60
sets of data such that each sample size was represented by 10 covariance
matrices, and the null and hypothesized models were fit to these 60
covariance matrices.

The population model used to generate the CSIM data was one of the 
substantive models to be considered (3CF, see Table 1) and so it is the most 
parsimonious model able to fit the CSIM data. Model 3UF, positing three 
unrestricted factors, should also be able to fit the data adequately though 
it is less parsimonious. Models positing only one or two factors (1UF and
2UF in Table 1) should not be able to fit the data. Furthermore, in each of
the remaining four substantive models positing three factors, either 3
(Models 3F1 and 3F2) or all 6 (Models 3F3 and 3SF) of the minor factor
loadings are constrained to be zero. Of these four models, only Model 3F3
contains superfluous parameters, parameters whose population value is zero.
This set of models provides additional tests of how the different indices
vary according to model complexity and to models known to over-fit or
under-fit the data in relation to the known population parameters. Two sets
of models (Models 3UF and 3CF, and Models 3F3 and 3SF) should be equivalent
in their ability to fit the data but differ in the number of parameters that
are estimated. For two additional sets of models (3UF vs. 3SF; 3F1 and 3F2
vs. 3SF) the model that should fit best requires more parameters so that an
index that over-corrects for capitalization on chance may distort
appropriate differences in fit. This relation between the substantive models
to be tested and the CSIM data was the basis of a priori contrasts used to
compare various models (see Table 2).

Results 

The analyses to be described are based on a set of 8 (substantive 
models) x 6 (sample sizes) ANOVAs which were followed up by the set of 9 a
priori contrasts described in Table 2. Separate analyses were conducted for 
each of the 23 fit indices and separate analyses were conducted for results 
of the SSIM and CSIM data. 

Simple Simulated (SSIM) Data.
Models. In relation to the population model used to generate the SSIM
data, Models 3-8 should be able to fit the data (i.e., all nonzero
population parameters are estimated) whereas Models 1 and 2 should not. For
all 23 fit indices there are significant differences in the ability of
competing models to fit the data (see etas attributable to the model in
Table 3), and most of this difference is due to the poorer fits of Models 1
and 2.

For the SSIM data, Models 3-8 are all able to fit the data but differ
in the number of parameters that are posited. Because all these models
should be able to fit the data, it could be argued that the models should
not differ in goodness of fit. For analyses conducted on just Models 3-8
(Table 4) the effect of model complexity varies substantially with the
fit index: 7 indices show significantly better fits when more (superfluous)
parameters are estimated, 6 indices show significantly poorer fits when more
parameters are estimated, and the remaining 10 indices are not significantly
related to the number of estimated parameters.



Insert Tables 3 & 5 and Figure 1 About Here 



For all but 3 indices (DK, MC, and LHRI1) the effect of the models
interacted significantly with sample size (see Table 3), though the size of
this interaction was substantial for only 6 indices. Particularly for these
6 indices there is a similar pattern of interaction. For Models 1 and 2 that
are unable to fit the data, fit becomes substantially poorer as sample size
increases (see χ² in Figure 1). For Models 3-8 that are able to fit the
data, differences between models are less related to sample size. Thus, for
analyses of just Models 3-8 (Table 4) the size of this interaction is much
smaller. The form of the interaction is illustrated for other selected
indices in Figure 1.
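
The pattern described for χ² follows from a standard large-sample
approximation: the expected value of χ² is roughly df + (N - 1) x F0, where
F0 is the population discrepancy of the model. The sketch below uses a
hypothetical F0 of .5 for a model that cannot fit and F0 of 0 for a model
that can. The rescaled quantity (χ² - df)/(N - 1) only illustrates the idea
behind noncentrality-based indices; it is not offered as the definition of
Dk, which is given in the Appendix.

```python
SAMPLE_SIZES = [50, 100, 200, 400, 800, 1600]


def expected_chi_square(n, df, f0):
    """Large-sample approximation: E(chi-square) = df + (N - 1) * F0."""
    return df + (n - 1) * f0


def rescaled_noncentrality(chi2, n, df):
    """Noncentrality estimate per case: (chi-square - df) / (N - 1)."""
    return (chi2 - df) / (n - 1)


DF = 27          # largest df among the substantive models in the text
F0_MISFIT = 0.5  # hypothetical population discrepancy of a model that cannot fit
for n in SAMPLE_SIZES:
    chi2_misfit = expected_chi_square(n, DF, F0_MISFIT)
    chi2_true = expected_chi_square(n, DF, 0.0)
    print(n, chi2_misfit, chi2_true,
          rescaled_noncentrality(chi2_misfit, n, DF))
```

Under these assumptions the expected χ² of the misfitting model grows with
N, the expected χ² of the fitting model stays at its df, and the rescaled
quantity is the same at every sample size.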

Sample Size (N) Effect. The effect of the six levels of N is
statistically significant and substantial for 17 of the 23 indices (etas of
.26 to .96; see Table 3). For these 17 indices most of this effect can be
explained by the linear effect of log N (rs of .25 to -'.90). The direction
of the effect of N, however, depends on the particular index (see Table 3
and Figure 1). The relation between goodness of fit and N is not
statistically significant for McDonald's Dk and Mc stand-alone indices and
the MMI relative index, and is very small for the TLI relative index.
A Priori Contrasts. The purposes of the a priori contrasts are to test
the ability of the 23 indices to differentiate among models known to differ
in their ability to fit the data, and to evaluate the indices in relation to
capitalization on chance. For the SSIM data, the set of 9 a priori contrasts
can be divided into two types. Contrasts 1 and 2 compare models that are
known to differ substantially in their ability to fit the data, whereas
contrasts 3-9 compare models that are all able to fit the data. For
contrasts 1 and 2, comparisons based on 20 of the 23 indices are
statistically significant and in the right direction. For CN both contrasts
are in the right direction but one is not statistically significant. For the
two parsimony indices one or both of the contrasts are significant but in
the wrong direction. These results based on contrasts 1 and 2 provide
support for 20 of the indices, but call into question the usefulness of CN
and the two parsimony indices.

Contrasts 3-9 are all based on comparisons among Models 3-8 that do
not differ in their ability to fit the data. Because the SSIM data was
generated by a population model containing only 21 parameters estimated in
Model 3SF (Table 1), additional parameters are superfluous. For just
contrast 5 the models being compared are equally able to fit the data and
posit the same number of parameters (each contains 3 superfluous
parameters); this contrast fails to reach statistical significance for any
of the 23 indices.

Contrasts 3, 4, 6, 7, 8 and 9 all compare models that are able to fit
the data but differ in the number of (superfluous) parameters. For each of
these contrasts (Table 3), a plus (+) indicates that the model with more
parameters fits the data better whereas a minus (-) indicates the opposite.
The behavior of the different fit indices in relation to these contrasts
varies substantially and falls into three classifications.

1) For 8 indices (FF, LHR, χ², RMR, GFI, FFI1, LHRI1, χ²I1) all
statistically significant contrasts favor the models that posit more
parameters. For these 8 indices, even those contrasts that are not
statistically significant favor models that posit more parameters. For these
indices more complex models positing more parameters fit the data better.
Because the true population values for these additional parameters are known
to be zero for this simulated data, this improved fit is illusory and due to
capitalization on chance.
2) For 6 of the fit indices (CSK, CAK, OCSK, OCAK, PIχ², and PIRMR),
all statistically significant contrasts favor models that posit fewer
parameters. For these 6 indices, even those contrasts that are not
statistically significant favor models with fewer parameters. That is,
models positing more (superfluous) parameters fit the data more poorly than
models positing fewer parameters so that these indices can be said to
penalize model complexity. The danger in penalizing model complexity too
severely is observed for the two parsimony indices in relation to contrasts
1 and 2. For both these contrasts, the better model (in relation to the
known population model) posited more parameters. The two parsimony indices
so severely penalize the inclusion of additional parameters that better
fitting models have significantly poorer indices of fit. Examination of the
contrasts for the remaining four indices in this second group suggests that
the CSK and OCSK penalize model complexity more severely than CAK and OCAK
(also see Bozdogan, 1987, for a mathematical basis for this observation).
However, because contrasts 1 and 2 are statistically significant and in the
right direction for each of these four indices, there is no basis for
claiming that model complexity is penalized too severely. Indeed, it may be
reasonable to severely penalize the inclusion of superfluous parameters so
long as models better able to fit known population parameters have better
indices than models less able to fit known population parameters. Although a
useful guideline for simulated data, this condition cannot be tested for
real data since the population parameters can never be known.

3) For the remaining 9 fit indices (χ²/df, AGFI, CN, DK, MC, Z,
χ²/dfI1, TLI, and MMI), none of the contrasts are statistically significant.
That is, for these indices models positing more (superfluous) parameters do
not differ significantly from models positing fewer parameters.

Summary of SSIM analyses. Analyses of the SSIM data were used to
examine the behavior of 23 indices of fit. Four of the indices (DK, Mc,
TLI, and MMI) were relatively independent of N and were not significantly
affected by the inclusion of superfluous parameters. The remaining 19
indices were at least moderately influenced by N and many were significantly
affected by the inclusion of superfluous parameters. CN, in addition to
being substantially influenced by sample size, did not differentiate between
models known to differ in their ability to fit the data. The two parsimony
indices, in addition to being moderately influenced by N, were shown to
penalize model complexity too severely.
Complex Simulated (CSIM) Data.

Models. For the CSIM data the effect of the different models is
statistically significant and substantial for all 23 fit indices (Table 5).
This effect of models interacts significantly with N for 17 of the fit
indices, though the size of the interaction is substantial for only 6
indices. The indices most affected by this interaction and the nature of
this interaction are similar to that observed for the SSIM data (also see
Figure 1), and so are not discussed further.



Insert Table 5 
Sample Size (N) Effect. The effect of N is statistically significant and
substantial for 19 of the 23 indices (etas of .26 to .96; see Table 5). For
these 19 indices most of this effect is linearly related to log N (rs of
-'.25 to -'.90), but the direction of this effect depends on the index (see
Table 5 & Figure 1). The relation between goodness of fit and N is not
statistically significant for Dk, Mc, TLI and MMI. Again, these results are
similar to those observed for the SSIM data.

A Priori Contrasts. For the CSIM data, the set of 9 a priori contrasts
can be divided into two types. Contrasts 1, 2, 4, 5, 6, 7, and 9 are between 
models known to differ in their ability to fit the data, whereas contrasts 3 
and 8 compare models that are equally able to fit the data but differ in the 
number of superfluous parameters that are posited. 

Contrasts 1 and 2 are gross tests in that they compare the 3 
unrestricted models positing 1, 2 and 3 factors. For 20 of the 23 indices, 
contrasts 1 and 2 are statistically significant and in the right direction. 
For CN both contrasts are in the right direction but one is not 
statistically significant. For the two parsimony indices one or both of the 
contrasts is significant but in the wrong direction. These results based on 
contrasts 1 and 2 are similar to findings based on the SSIM data and so are 
not discussed further. 

Contrasts 4 and 9 are also rather gross tests in that models that are 
able to fit the data (3UF and 3CF) are compared to model 3SF in which all 6 
minor factor loadings known to be nonzero in the population model are fixed 
to be zero. For 17 of the 23 indices, these contrasts are statistically
significant and in the right direction. For CSK, OCSK, OCAK and the two
parsimony indices, one or both of these comparisons is statistically
significant and in the wrong direction. This demonstrates that, with respect
to these contrasts, these indices penalize model complexity too severely.

Contrasts 6 and 7 are less gross in that the models being compared 
differ in terms of only 3 of the 6 minor factor loadings. For 16 of the 23 
indices, these contrasts are statistically significant and in the right 
direction. For the two parsimony indices, both these contrasts are 
statistically significant but in the wrong direction. For CSK the contrasts 
are in the wrong direction, but not statistically significant. For CAK, CN, 
OCSK, and LHRIl, one of these contrasts was not statistically significant 
though none were in the wrong direction. This demonstrates that with respect
to these contrasts, at least the parsimony indices penalize model
complexity too severely.

Contrast 5 compares models 3F1 and 3F2, in which 3 of the 6 minor factor
loadings are fixed to be zero, with model 3F3, in which all 6 are fixed to be
zero. Thus, Models 3F1 and 3F2 should be able to fit the data better than
model 3F3. In model 3F3, however, 3 additional superfluous parameters are
also estimated so the df is the same for all three models. For only 5 (X2,
X2/df, OCAK, OCSK, and Z) of the 23 indices is this contrast statistically
significant and in the right direction. For all 23 indices, however, this
contrast was in the right direction and the contrast approached statistical
significance for many of these indices. It is also relevant to note that the
results of this contrast are not related to the number of estimated
parameters in that all the models posited the same number of parameters.

Contrasts 3 and 8 compare models that are equally able to fit the data 
but differ in the number of (superfluous) parameters. As observed with the 
SSIM data in this situation, the behavior of the indices fell into three
general categories. For 10 indices one or both of these contrasts are 
statistically significant such that the model positing more parameters fits 
the data better. As noted previously, this improved fit is illusory and 
represents capitalization on chance. For 4 indices one or both of these 
contrasts are statistically significant such that the model positing fewer 
parameters fits the data more poorly. For 9 indices, neither of these 
contrasts is statistically significant. 

Summary of CSIM analyses. Analyses of the CSIM data were used to
examine the behavior of 23 indices of fit. Four of the indices (Dk, Mc,
TLI, and MMI) were relatively independent of N and were not significantly
affected by the inclusion of superfluous parameters. The remaining 19
indices were at least moderately influenced by N and many were shown to
significantly capitalize on chance when superfluous parameters were
estimated. CSK, OCSK, the two parsimony indices, and perhaps CAK, in addition
to being moderately influenced by N, were shown to penalize model complexity
too severely in that models less able to fit the data provided better fits
than models better able to fit the data. These findings are generally
consistent with those based on the SSIM data.

Discussion

Results for both the SSIM and CSIM data lead to clear conclusions about
the behavior of fit indices considered here. For 19 of the indices (all
but Dk, Mc, TLI, and MMI) there was a moderate or large effect of N.
These results are consistent with conclusions by Marsh, Balla and McDonald
(1988), Marsh and McDonald (1988), and McDonald (in press). These same 4
indices were also shown to be not significantly affected by the inclusion of
superfluous parameters that had population values known to be zero. In
contrast, the addition of superfluous parameters resulted in significant
improvements in fit that were due to capitalizing on chance for many of the
indices. Other indices were shown to penalize model complexity too severely
in that inclusion of parameters that had nonzero values in the population
led to a significantly poorer fit. In some instances indices penalized model
complexity so severely that models better able to fit the data in relation
to the known population parameters produced poorer fit indices than models
that were less able to fit the data but contained fewer parameters. Whereas
a few other indices were not significantly affected by the introduction of
superfluous parameters, all of these other indices were at least moderately
affected by sample size. Hence, in relation to the desirable characteristics
of fit indices considered here, there is clear support for only the Dk, Mc,
TLI, and MMI indices.

The empirical results presented here suggest little basis for choosing 
among the four recommended indices. In fact correlations among these indices 
are .97 or higher for the data considered in this study. Theoretically, 
however, the four indices differ in important ways. McDonald's two indices 
are absolute or stand-alone indices that depend only on the model being 
tested. Mc may be preferable to Dk in that it varies on a zero-to-one
continuum that may prove to be more easily interpreted. McDonald noted, 
however, that such interpretations must be subjective since only the 
saturated model is true in application. TLI and MMI are both incremental or 
relative indices that depend on the fit of a null model as well as the fit 
of the hypothesized model. The TLI is much better known than the new MMI, 
but its estimation is frequently unstable, particularly when sample size is
small (see Figure 1; also see Anderson & Gerbing, 1984; Marsh, Balla &
McDonald, 1988). Further research may show, however, that the same problem
applies to the MMI although it was not apparent in the present 
investigation. The Dk, Mc, and MMI also differ from the TLI in that the 
first three are monotonically related to the number of estimated parameters 
whereas McDonald and Marsh (1988) show that the TLI can be written as an 
index of fit that is weighted by a parsimony index. In this respect, the TLI 
can be said to penalize model complexity whereas the other indices do not. 
In the present investigation this mathematical distinction between these 
indices was not demonstrated empirically. This can apparently be explained 
by the observation that when the TLI is sufficiently large, as in most of 
the contrasts in the present investigation, the size of this penalty is 
negligible. Hence, it is possible that these four indices will differ more
substantially in other situations and this is an important question for
further research.

The present investigation is based on a variety of models fit to 
simulated data of varying sample sizes derived from only two different
population models. Hence, there is concern about the generality of our 
findings, particularly with respect to use of simulated data. We found that 
19 of the indices considered here were at least moderately affected by 
sample size, and that many of these were significantly influenced by the 
addition of superfluous parameters that represented capitalization on 
chance. Other indices were shown to penalize model complexity too severely 
in that models able to fit the data resulted in significantly poorer indices 
than models not able to fit the data. These findings call into question the 
usefulness of these indices as indicators of fit according to the criteria 
proposed here. Even if other research shows any of these indices to be 
useful in some specific situations, our results would stand as 
counter instances to the generality of claims to their usefulness. 

We found that 4 of the indices considered here were relatively 
independent of sample size and were not significantly affected by the
inclusion of additional parameters. The conclusions about the effect of N on
these indices are consistent with other empirical research and mathematical
derivations of the indices. Further tests of the generality of our findings 
based on the data sets considered here, however, will help clarify the 
relations between these indices and model complexity. With further research 
it may be possible to establish useful guidelines on the values of these 
indices that constitute acceptable fit, but such attempts may be unjustified 
for any of the other 19 indices considered here. For real data, however, 
none of the population parameters will generally have a zero value so that 
there may be no rational basis for concluding that any restricted model fits 
the data better than the saturated model. Ultimately model selection must be 
based on evaluation of fit, the behavior of competing models, and 
substantive issues. From this perspective it would be undesirable to 
establish absolute guidelines about what constitutes an adequate fit that 
are independent of the research context. 

Footnotes

1. Several additional incremental (relative) indices, referred to as Type-2
incremental indices (see Appendix for discussion of Type-1 and Type-2
incremental indices) by Marsh, Balla and McDonald (1988), were found to be
relatively independent of sample size for the four data sets considered in
that study. McDonald and Marsh (1988) subsequently showed, however, that by
their mathematical form some of these indices should vary with sample size
under certain conditions that did not exist in the data sets considered by
Marsh, Balla and McDonald (1988).

2. McDonald first developed his two indices, Mc and Dk, based on the
noncentrality parameter in late 1986, as described by McDonald (in press).
Shortly after their development, in February of 1987, Marsh and McDonald
proposed the incremental type-1 and type-2 forms of both these indices for
purposes of the present investigation. Only the results of the DkI2 are
actually presented here. DkI1 and DkI2 are mathematically identical (see
Appendix) whereas the pattern of empirical results based on the McI2 were
nearly identical to those based on DkI2. McI1, because it was significantly
related to sample size, was not pursued for purposes of the present
investigation. Subsequently, in October 1988, McDonald and Marsh evaluated
the mathematical properties of DkI2 more fully in research described in
Marsh and McDonald (1988). For purposes of that paper and the present
investigation, the index is referred to as the McDonald and Marsh index
(MMI).




REFERENCES 

Akaike, H. (1974). A new look at the statistical model identification. IEEE
Transactions on Automatic Control, 19, 716-723.

Akaike, H. (1981). Likelihood of a model and information criteria. Journal
of Econometrics, 16, 3-14.

Anderson, J. C., & Gerbing, D. W. (1984). The effect of sampling error on
convergence, improper solutions, and goodness-of-fit indices for maximum
likelihood confirmatory factor analysis. Psychometrika, 49, 155-173.

Bearden, W. O., Sharma, S., & Teel, J. R. (1982). Sample size effects on
chi-square and other statistics used in evaluating causal models. Journal
of Marketing Research, 19, 425-530.

Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness
of fit in the analysis of covariance structures. Psychological
Bulletin, 88, 588-606.

Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete
multivariate analysis: Theory and practice. Cambridge, Mass: MIT Press.

Bollen, K. A. (1986). Sample size and Bentler and Bonett's nonnormed fit
index. Psychometrika, 51, 375-377.

Boomsma, A. (1982). The robustness of LISREL against small sample sizes in
factor analysis models. In K. G. Joreskog and H. Wold (Eds.), Systems
under indirect observation: Causality, structure, prediction (Part I).
Amsterdam: North-Holland.

Bozdogan, H. (1987). Model selection and Akaike's information criterion
(AIC): The general theory and its analytical extensions. Psychometrika,
52, 345-370.

Cudeck, R., & Browne, M. W. (1983). Cross-validation of covariance
structures. Multivariate Behavioral Research, 18, 147-167.

Gerbing, D. W., & Anderson, J. C. (1985). Effects of sampling error and model
characteristics on parameter estimation for maximum likelihood confirmatory
factor analysis. Multivariate Behavioral Research, 20, 255-271.

Hoelter, J. W. (1983). The analysis of covariance structures:
Goodness-of-fit indices. Sociological Methods & Research, 11, 325-344.

Horn, J. L., & McArdle, J. J. (1980). Perspectives on
mathematical/statistical model building (MASMOB) in research on aging. In
Poon, L. W. (ed.), Aging in the 1980's: Selected contemporary issues in the
psychology of aging (pp. 503-541). American Psychological Association,
Washington, DC.

Hull, C. H., & Nie, N. H. (1981). SPSS update 7-9. New York: McGraw-Hill.

James, L. R., Mulaik, S. A., & Brett, J. M. (1982). Causal analysis:
Assumptions, models, and data. Beverly Hills, CA: Sage.

Joreskog, K. G., & Sorbom, D. (1981). LISREL: Analysis of Linear
Structural Relations By the Method of Maximum Likelihood. Chicago:
International Educational Services.

Marsh, H. W. (1987). The factorial invariance of responses by males and
females to a multidimensional self-concept instrument: Substantive and
methodological issues. Multivariate Behavioral Research, 22, 457-480.

Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness-of-fit indices
in confirmatory factor analysis: The effect of sample size. Psychological
Bulletin, 102, 391-410.

McDonald, R. P. (1985). Factor analysis and related methods. Hillsdale, NJ:
Erlbaum.

McDonald, R. P., & Marsh, H. W. (1988). Choosing a multivariate model:
Noncentrality and goodness-of-fit. (In Review).

McDonald, R. P. (in press). An index of goodness-of-fit based on
noncentrality. Journal of Classification.

McArdle, J. J. (1986). Latent variable growth within behavior genetic models.
Behavior Genetics, 16, 163-200.

Schwartz, G. (1978). Estimating the dimension of a model. Annals of
Statistics, 6, 461-464.

Steiger, J. H., & Lind, J. M. (May, 1980). Statistically-based
tests for the number of common factors. Paper presented at the
Psychometrika Society Meeting, Iowa City.

Tanaka, J. S. (1987). "How big is big enough?": Sample size and goodness of
fit in structural equation models with latent variables. Child
Development, 58, 134-146.

Tanaka, J. S., & Huba, G. J. (1985). A fit index for covariance structure
models under arbitrary GLS estimation. British Journal of Mathematical and
Statistical Psychology, 46, 621-635.

Tucker, L. R., & Lewis, C. (1973). The reliability coefficient for maximum
likelihood factor analysis. Psychometrika, 38, 1-10.

Wald, A. (1943). Tests of statistical hypotheses concerning several
parameters when the number of observations is large. Transactions of the
American Mathematical Society, 54, 426-482.






APPENDIX I 

Descriptions of the 23 Goodness of Fit Indices Used in This Study

Four types of fit indices are considered here. Stand alone indices are based
on results of just the hypothesized model. Two forms of incremental indices,
called type-1 and type-2 for present purposes, are based on differences in
fit between a hypothesized model and a null model. Parsimony indices are an
alternative form of the type-1 incremental indices that impose a penalty
function for the inclusion of additional parameters.

I. Absolute, Stand-alone Indices.

The maximum likelihood fitting function (FF) and the scaled likelihood ratio
(LHR). Although not typically presented as fit indices (but see Cudeck &
Browne, 1983), the FF and LHR are the basis for the X2 test statistic and
most other fit indices. The FF has a minimum value of 0 when E = S, but does
not have an upper bound. The scaled LHR has a maximum value of 1.0 when E = S
and a minimum value of zero. The FF and LHR are defined as:

(1) $\mathrm{FF} = X^2 / N$,

(2) $\mathrm{LHR} = \exp(X^2 / (-2N)) = e^{-\mathrm{FF}/2}$.

X2 and X2/df Ratio. These two indices continue to be the most frequently used
indices. The X2 for a false model varies directly with sample size, but the
X2 for a true model does not. In CFA the df does not vary with the sample
size, so that the effect of sample size on the X2/df must necessarily be the
same as for the X2. For alternative models of the same data, increasing the
number of parameters necessarily results in a better (i.e., lower) X2.
Because the X2/df ratio incorporates a penalty function for using more
parameters, it may be poorer if additional parameters result in little
improvement in X2. They are defined as:

(3) $X^2 = N [\mathrm{tr}(E^{-1} S - I) - \log |E^{-1} S|] = N \cdot \mathrm{FF}$,

(4) $X^2/df = (N/df) \cdot \mathrm{FF}$.
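
As a rough illustration of equations (1), (2), and (4), the following Python
sketch computes FF, LHR, and X2/df from a chi-square value. The function names
and the input values are hypothetical and are not taken from the study.

```python
import math

def fitting_function(chi2: float, n: int) -> float:
    """Equation (1): FF = X2 / N."""
    return chi2 / n

def scaled_lhr(chi2: float, n: int) -> float:
    """Equation (2): LHR = exp(-FF / 2)."""
    return math.exp(-0.5 * fitting_function(chi2, n))

def chi2_df_ratio(chi2: float, n: int, df: int) -> float:
    """Equation (4): X2/df = (N / df) * FF."""
    return (n / df) * fitting_function(chi2, n)

# Hypothetical inputs for illustration only (not values from the study).
chi2, n, df = 48.0, 200, 24
ff = fitting_function(chi2, n)
lhr = scaled_lhr(chi2, n)
ratio = chi2_df_ratio(chi2, n, df)
print(f"FF={ff:.3f}  LHR={lhr:.3f}  X2/df={ratio:.3f}")
```

When the fitted and observed matrices coincide, X2 is zero, so FF is 0 and
LHR is 1.0, matching the bounds stated above.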

LISREL's root mean square residual (RMR). Joreskog and Sorbom (1981, p. I.41)
define the RMR as the square root of the mean of squared residuals in S and
E. When S and E are based on correlation matrices RMR is strictly bounded by
0 and 1. For covariance matrices RMR still has a lower-bound of zero but does
not have an upper bound. Thus RMR must be interpreted in relation to the size
of the variances and covariances of the measured variables, and cannot be
compared across applications based on different variables. RMR is defined as:

(5) $\mathrm{RMR} = [2 \sum_{i=1}^{p} \sum_{j=1}^{i} (s_{ij} - e_{ij})^2 / (p (p+1))]^{1/2}$,

where s_ij and e_ij are elements in S and E.
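
A minimal sketch of equation (5) in Python, assuming S and E are supplied as
square nested lists and that the double sum runs over the lower triangle
including the diagonal (the p(p+1)/2 nonredundant elements). The function name
and the 2 x 2 matrices are hypothetical.

```python
import math

def rmr(s, e):
    """Equation (5): root mean square residual over the p(p+1)/2
    nonredundant elements of S and E."""
    p = len(s)
    total = sum(
        (s[i][j] - e[i][j]) ** 2
        for i in range(p)
        for j in range(i + 1)  # lower triangle, diagonal included
    )
    return math.sqrt(2.0 * total / (p * (p + 1)))

# Hypothetical 2 x 2 correlation matrices (not data from the study).
S = [[1.0, 0.5], [0.5, 1.0]]
E = [[1.0, 0.3], [0.3, 1.0]]
print(round(rmr(S, E), 4))
```

Only one of the three nonredundant elements differs here (by .2), so the
result is the square root of .04 / 3.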

LISREL's Goodness-of-fit (GFI) and adjusted GFI (AGFI). Joreskog and Sorbom
(1981; also see Tanaka & Huba, 1985) describe the GFI and AGFI as computed by
LISREL. They state that GFI is "a measure of the relative amount of variances
and covariances jointly accounted for by the model" and assert that "unlike
X2, GFI is independent of the sample size" while AGFI "corresponds to using
mean squares instead of total sums of squares" (Joreskog & Sorbom, 1981, p.
I.40-41). Thus AGFI incorporates a penalty function for additional
parameters. Joreskog and Sorbom suggest that GFI and AGFI will generally fall
between 0 and 1, but that it is possible for them to be negative. They are
defined as:

(6) $\mathrm{GFI} = 1 - \mathrm{tr}[(E^{-1} S - I)^2] / \mathrm{tr}[(E^{-1} S)^2]$,

(7) $\mathrm{AGFI} = 1 - [p (p+1) / (2 df)] (1 - \mathrm{GFI})$.

Information Criterion. Akaike (1974, 1981) and Schwartz (1978) each proposed
fit indices that incorporate penalty functions based on the number of
parameters that are estimated. Cudeck and Browne (1983, p. 154) proposed
rescaled versions of these indices expressed in terms of FF. For purposes of
the present investigation, Cudeck and Browne's rescaling of the CAK (based on
Akaike, 1974) and CSK (based on Schwartz, 1978) are defined as:

(8) $\mathrm{CAK} = \mathrm{FF} + 2K / N$,

(9) $\mathrm{CSK} = \mathrm{FF} + (K \ln N) / N$,

where K = the number of parameters to be estimated.

The corresponding indices originally proposed by Akaike and by Schwartz are
defined as:

(10) $\mathrm{OCAK} = X^2 + 2K$,

(11) $\mathrm{OCSK} = X^2 + K \ln N$.
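
The four criteria in equations (8) to (11) can be sketched as follows. The
function names and the example values are hypothetical. With FF = X2 / N from
equation (1), the original criteria are simply N times the rescaled ones,
which the last two lines check.

```python
import math

def cak(ff, k, n):
    """Equation (8): rescaled Akaike criterion."""
    return ff + 2.0 * k / n

def csk(ff, k, n):
    """Equation (9): rescaled Schwartz criterion."""
    return ff + k * math.log(n) / n

def ocak(chi2, k):
    """Equation (10): original Akaike criterion."""
    return chi2 + 2.0 * k

def ocsk(chi2, k, n):
    """Equation (11): original Schwartz criterion."""
    return chi2 + k * math.log(n)

# Hypothetical values: X2 = 48 with K = 21 estimated parameters, N = 200.
chi2, k, n = 48.0, 21, 200
ff = chi2 / n
print(cak(ff, k, n), csk(ff, k, n))
print(math.isclose(ocak(chi2, k), n * cak(ff, k, n)))
print(math.isclose(ocsk(chi2, k, n), n * csk(ff, k, n)))
```

Because ln(N) exceeds 2 for any N of 8 or more, the Schwartz-based penalty per
parameter is the larger of the two at every sample size used in this study.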

Critical N (CN). Hoelter (1983, p. 528) argued that "rather than ignoring or
completely neutralizing sample size we can estimate the size that a sample
must reach in order to accept the fit of a given model on a statistical
basis. This estimate, referred to here as 'critical N' (CN), allows one to
assess the fit of a model relative to identical hypothetical models estimated
with different sample sizes." Hoelter cautioned that no firm basis could be
offered as to what constituted an adequate fit, but he suggested that a value
of 200 was a reasonable starting point for suggesting that differences
between the model and data may be unimportant. In practice the usefulness of
CN would rest on the assumption that its value is independent of sample size.
It is defined as:

(12) $\mathrm{CN} = [z_{crit} + (2 df - 1)^{1/2}]^2 / [2 X^2 / N] + 1$,

where z_crit = the critical value from a normal curve table for a given
probability level, 1.96 in the present investigation.
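
A small sketch of equation (12), with hypothetical inputs, shows why CN cannot
be independent of sample size when the model is true. If X2 stays near its
expected value (the df) as N grows, the denominator 2 X2 / N shrinks and CN
rises in proportion to N. The function name and values are illustrative only.

```python
import math

def critical_n(chi2, df, n, z_crit=1.96):
    """Equation (12): Hoelter's critical N as written in this appendix."""
    return (z_crit + math.sqrt(2 * df - 1)) ** 2 / (2.0 * chi2 / n) + 1.0

# Hypothetical true model: X2 equal to its df (24) at two sample sizes.
cn_100 = critical_n(24.0, 24, 100)
cn_400 = critical_n(24.0, 24, 400)
print(round(cn_100, 1), round(cn_400, 1))
```

With X2 held constant, quadrupling N quadruples CN - 1.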

McDonald's Fit Indices. McDonald (in press) notes that a problem with the
CAK, as with many other fit indices, is that the value of the index and model
selection based on it are dependent on sample size. His DK index is based on
similar formulations as the CAK but with a slightly different derivation.
McDonald proposed Wald's (1943) noncentrality parameter (also see related
suggestions by Steiger, 1980), rescaled to be independent of sample size, as
an index of fit, estimated by:

(13) $\mathrm{DK} = \mathrm{FF} - df/N = \mathrm{CAK} - (2K/N) - df/N$.

McDonald further proposed that DK could be transformed to yield Mc, a measure
of centrality that is a consistent estimator of the asymptotic likelihood
ratio scaled to be independent of sample size. Mc is scaled to lie on the
interval zero to unity with unity representing a perfect fit, though sampling
error may produce values greater than 1.0. It is defined as:

(14) $\mathrm{Mc} = \exp(-.5\, \mathrm{DK})$.
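
The following sketch of equations (13) and (14) uses hypothetical numbers to
illustrate the sample-size argument. It assumes, for illustration only, that
the expected X2 of a misspecified model is N times a fixed population misfit
plus the df, so that DK recovers the same misfit at every N. Function names
are not from the source.

```python
import math

def dk(chi2, df, n):
    """Equation (13): DK = FF - df/N, with FF = X2 / N."""
    return chi2 / n - df / n

def mc(chi2, df, n):
    """Equation (14): Mc = exp(-.5 DK)."""
    return math.exp(-0.5 * dk(chi2, df, n))

# Hypothetical population misfit of .10 with df = 24 (illustrative only).
misfit, df = 0.10, 24
for n in (50, 100, 200, 400, 800, 1600):
    chi2 = n * misfit + df          # assumed expected X2 at this N
    print(n, round(dk(chi2, df, n), 3), round(mc(chi2, df, n), 3))
```

A sample X2 smaller than the df gives a negative DK and therefore an Mc above
1.0, which is the sampling-error case mentioned above.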

Normal Deviate Z-score. Horn and McArdle (1980) proposed the Wilson-Hilferty
normal deviate Z-score (also see Bishop, Fienberg & Holland, 1975, p. 527) as
a useful indicator of fit. It is defined as:

(15) $Z = [(X^2/df)^{1/3} - (1 - 2/(9\, df))] / (2/(9\, df))^{1/2}$.

Because this quantity is a monotonic function of X2, it apparently will be
influenced by N so long as the hypothesized model is false.
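
A minimal sketch of equation (15), assuming the standard Wilson-Hilferty cube
root form. The function name and inputs are hypothetical.

```python
import math

def wilson_hilferty_z(chi2, df):
    """Equation (15): normal deviate based on the cube root of X2/df."""
    c = 2.0 / (9.0 * df)
    return ((chi2 / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)

# Hypothetical X2 values for a model with df = 24.
for chi2 in (24.0, 36.0, 48.0):
    print(chi2, round(wilson_hilferty_z(chi2, 24), 3))
```

For fixed df, Z rises steadily with X2, which is the monotonic relation noted
above.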

II. Relative, Type-1 Incremental Fit Indices.

Bentler and Bonett (1980) proposed that valuable information could be
obtained by comparing the ability of nested models to fit the same data. In
the case of CFA it may be useful to compare the fit of the proposed target
model with the fit of a null model in which all the p variables are assumed
to be uncorrelated. (It should be noted that in general models for the
analysis of covariance structures the null model is not the only more
restrictive model that could be considered as a baseline model.) If the fit
of a null model is reasonable, because the sample size is small or because
the measured variables are relatively uncorrelated, then the difference in
fit between the null and target models will be small. However, if the fit of
the null model is reasonable then there is little covariance to explain and
no basis of support for the target model even if it also fits the data.
Bentler and Bonett specifically stated that these indices are useful for
comparing the fit of a particular model across samples that have unequal
sizes. They cautioned that the absolute value of these indices may be
difficult to interpret, but that values of less than .9 usually mean that
the model can be improved substantially. Much of the value of these indices
is based on the assumption that their behavior is independent of sample size.

One form of the incremental index, called type-1 incremental indices for
present purposes, can be used to derive incremental fit indices from each of
the stand alone indices described earlier: Absolute Value (t - n) / Maximum
of (t or n), where t is the value of a stand-alone index for the target
model, and n is the value for the null model. For present purposes,
incremental type-1 indices were defined in relation to the FF, LHR, X2, and
X2/df, and are denoted by appending an I1 to each stand-alone index. The X2I1
is more commonly known as the Bentler-Bonett Index (BBI) and Bollen (1986)
described an index related to the FFI1 (see Marsh, Balla & McDonald, 1988).
These are defined as:

(16) $\mathrm{FFI1} = (\mathrm{FF}_n - \mathrm{FF}_t) / \mathrm{FF}_n$,

(17) $\mathrm{LHRI1} = (\mathrm{LHR}_t - \mathrm{LHR}_n) / \mathrm{LHR}_t$,

(18) $\mathrm{X2I1} = \mathrm{BBI} = (X^2_n - X^2_t) / X^2_n$,

(19) $\mathrm{X2/dfI1} = (X^2_n/df_n - X^2_t/df_t) / (X^2_n/df_n)$.
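
The general type-1 form, |t - n| / max(t, n), can be sketched in a single
helper that reproduces equations (16) to (19) when given the matching
stand-alone values. The function name and all numbers are hypothetical.

```python
def type1_incremental(target, null):
    """General type-1 form: |t - n| / max(t, n)."""
    return abs(target - null) / max(target, null)

# Hypothetical fit results: null model X2 = 900 (df 36), target X2 = 48 (df 24).
chi2_n, df_n = 900.0, 36
chi2_t, df_t = 48.0, 24

bbi = type1_incremental(chi2_t, chi2_n)                       # equation (18)
ratio_i1 = type1_incremental(chi2_t / df_t, chi2_n / df_n)    # equation (19)
print(round(bbi, 3), round(ratio_i1, 3))
```

For indices where smaller values mean better fit (FF, X2, X2/df) the maximum
is the null-model value; for LHR, where larger is better, it is the
target-model value, which is why the denominator of equation (17) differs.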

III. Parsimony Indices.

James et al. (1982) also described an alternative form of the incremental
type-1 indices called the parsimony index (PI). The PI invokes a penalty
function for using additional parameters by multiplying an incremental type-1
index by the ratio of the dfs for the null and target models: PI = (df_t/df_n)
x Incremental Type-1 Index. Using this general formulation, James et al.
recommended a PI based on the X2 defined as:

(20) $\mathrm{PIX2} = (df_t/df_n) \times (X^2_n - X^2_t) / X^2_n$.

Similarly, McArdle (1986) described a parsimony index based on the RMR.

(21) $\mathrm{PIRMR} = (df_t/df_n) \times (1 - [\mathrm{RMR}_t / \mathrm{RMR}_n])$.

Additional parsimony indices could be derived for other stand-alone indices,
though this might not make sense for indices that already impose a penalty
function (e.g., the AGFI and the X2/df).
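
A short sketch of equation (20) with hypothetical numbers illustrates how the
df ratio can reverse the ordering of two models. Model B below has a worse X2
than model A but estimates fewer parameters, and it obtains the higher
parsimony index. Names and values are illustrative only.

```python
def bbi(chi2_t, chi2_n):
    """Equation (18): Bentler-Bonett index."""
    return (chi2_n - chi2_t) / chi2_n

def pi_x2(chi2_t, df_t, chi2_n, df_n):
    """Equation (20): parsimony index based on X2."""
    return (df_t / df_n) * bbi(chi2_t, chi2_n)

# Hypothetical null model and two hypothetical target models.
chi2_n, df_n = 900.0, 36
model_a = (48.0, 24)   # better X2, more parameters (fewer df)
model_b = (60.0, 27)   # worse X2, fewer parameters (more df)

for name, (chi2_t, df_t) in (("A", model_a), ("B", model_b)):
    print(name, round(bbi(chi2_t, chi2_n), 3),
          round(pi_x2(chi2_t, df_t, chi2_n, df_n), 3))
```

This is the kind of reversal at issue when an index is said to penalize model
complexity too severely.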

IV. Relative, Incremental Type-2 Indices.

A second general form of the incremental fit indices described by Marsh,
Balla, and McDonald (1988) is: Absolute Value (t - n) / Absolute Value of (e
- n), where t is the value of a stand-alone index for the target model, n is
the value for the null model, and e is the expected value of the stand-alone
index if the target model is true. This second form of incremental index
requires the expected value for a true model in addition to empirical values
for the target and null models. In general, expected values for the
stand-alone indices are not known for finite samples but can be estimated
based on the asymptotic behavior of the indices. For example, many of the
stand alone indices can be specified in terms of X2 and the asymptotic
expected value for the X2 equals the df for the model. For purposes of the
present investigation, incremental type-2 indices were derived from only the
X2/df and Dk stand-alone indices. These are denoted by appending an I2 to each
of the stand-alone indices though the X2/dfI2 is better known as the Tucker
Lewis Index (Tucker & Lewis, 1973) and McDonald and Marsh (1988) refer to the
DkI2 as the McDonald-Marsh Index (MMI). These are defined as:

(22) $\mathrm{X2/dfI2} = \mathrm{TLI} = (X^2_n/df_n - X^2_t/df_t) / (X^2_n/df_n - 1.0)$,

(23) $\mathrm{DkI2} = \mathrm{MMI} = (\mathrm{Dk}_n - \mathrm{Dk}_t) / (\mathrm{Dk}_n - 0)$.

[Note that because the expected value of Dk for a true model is 0, the
incremental type-1 and type-2 forms of this index are the same.]
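
Equations (22) and (23) can be sketched as below, using Dk = (X2 - df) / N
from equation (13). The function names and inputs are hypothetical. Note that
N cancels in the MMI ratio once the X2 and df values are given, so the
function returns the same value whatever N is supplied.

```python
def tli(chi2_t, df_t, chi2_n, df_n):
    """Equation (22): Tucker-Lewis index (X2/dfI2)."""
    ratio_n = chi2_n / df_n
    ratio_t = chi2_t / df_t
    return (ratio_n - ratio_t) / (ratio_n - 1.0)

def mmi(chi2_t, df_t, chi2_n, df_n, n):
    """Equation (23): McDonald-Marsh index (DkI2)."""
    dk_n = (chi2_n - df_n) / n
    dk_t = (chi2_t - df_t) / n
    return (dk_n - dk_t) / (dk_n - 0.0)

# Hypothetical fit results: null X2 = 900 (df 36), target X2 = 48 (df 24).
print(round(tli(48.0, 24, 900.0, 36), 3))
print(round(mmi(48.0, 24, 900.0, 36, 200), 3))
```

With these illustrative inputs both indices are above .95, a range in which
the text notes that the TLI's parsimony weighting has little practical effect.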

FIGURE CAPTIONS

FIGURE 1. Values for selected goodness-of-fit indices based on two
population models (simple and complex), 8 models, and 6 sample sizes (50,
100, 200, 400, 800, and 1600 corresponding to the 6 column bars above each
model respectively).

Table 1

Description of Models To Be Tested.

Abbreviation   Parameters   Description

0                   9       Null
1UF (a)            18       1 Unrestricted Factor
2UF (a)            26       2 Unrestricted Factors
3UF (a)            33       3 Unrestricted Factors
3F1 (b)            24       3 Factors; 9 major, 3 minor factor loadings
3F2 (b)            24       3 Factors; 9 major, 3 minor factor loadings
3F3 (c)            24       3 Factors; 9 major, 3 minor factor loadings
3CF (d)            27       3 Complex Factors; 9 major, 6 minor factor loadings
3SF (e)            21       3 Simple Factors; 9 major factor loadings



Note. The nine models were designed to fit 9x9 covariance matrices
generated from one of two population models. The simple simulated (SSIM)
data was generated by the 3SF model in which 3 correlated factors were each
defined by a unique set of three variables. Thus the most parsimonious
model able to fit this data contained only 9 major factor loadings. The
complex simulated (CSIM) data was generated from the 3CF model that
contained three complex factors. In the 3CF model each factor was defined
by three major factor loadings, the same as those in the 3SF model, and two
additional minor loadings. Thus, the most parsimonious model able to fit
this data contained 9 major factor loadings and 6 minor factor loadings.
a. Unrestricted factor models for 1, 2, and 3 factors.
b. For the SSIM data these models contained 3 superfluous parameters, factor
loadings that had population values of zero. For the CSIM data 3 of the 6
minor factor loadings were constrained to be zero.
c. For both the SSIM and CSIM data this model contained 3 superfluous
parameters. For the CSIM data all 6 minor factor loadings were constrained
to be zero.
d. This model was used to generate the CSIM data. For the SSIM data it
contained 6 superfluous parameters.
e. This model was used to generate the SSIM data. For the CSIM data all 6
minor (non-zero) loadings were constrained to be zero.

Table 2

                          Models (a)                     Predictions For: (b)

             1UF 2UF 3UF 3F1 3F2 3F3 3CF 3SF    CSIM Data         SSIM Data

Contrast 1    -1  +1   0   0   0   0   0   0    2UF > 1UF         2UF > 1UF
Contrast 2     0  -1  +1   0   0   0   0   0    3UF > 2UF         3UF > 2UF
Contrast 3     0   0  +1   0   0   0  -1   0    3UF = 3CF         3UF = 3CF
Contrast 4     0   0  +1   0   0   0   0  -1    3UF > 3SF         3UF = 3SF
Contrast 5     0   0   0  +1  +1  -2   0   0    3F1,3F2 > 3F3     3F1,3F2 = 3F3
Contrast 6     0   0   0  -1  -1   0  +2   0    3CF > 3F1,3F2     3CF = 3F1,3F2
Contrast 7     0   0   0  +1  +1   0   0  -2    3F1,3F2 > 3SF     3F1,3F2 = 3SF
Contrast 8     0   0   0   0   0  +1   0  -1    3F3 = 3SF         3F3 = 3SF
Contrast 9     0   0   0   0   0   0  +1  -1    3CF > 3SF         3CF = 3SF

Note. SSIM = simple simulated data that was generated by the 3SF model; CSIM
= complex simulated data that was generated by the 3CF model.
a. See Table 1 for a description of the models.
b. For predictions represented by > signs, models on the left side of the >
signs should be better able to fit the data. For all but one prediction
represented by = signs, models on the left side of the = signs have more
parameters and thus provide a test of penalty functions imposed by some
indices. For just contrast 5 for the SSIM data, models on both sides of the
= sign have the same number of parameters.
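
As an illustration of how the weights in Table 2 define each comparison, the
sketch below applies them to a vector of mean index values, one per model.
The weights are those of Table 2; the mean values are invented for the
example, and the helper name is hypothetical. The significance tests reported
in the text would additionally require the ANOVA error term, which is not
shown here.

```python
MODELS = ["1UF", "2UF", "3UF", "3F1", "3F2", "3F3", "3CF", "3SF"]

# Contrast weights from Table 2, in the model order given above.
CONTRASTS = {
    1: [-1, 1, 0, 0, 0, 0, 0, 0],
    2: [0, -1, 1, 0, 0, 0, 0, 0],
    3: [0, 0, 1, 0, 0, 0, -1, 0],
    4: [0, 0, 1, 0, 0, 0, 0, -1],
    5: [0, 0, 0, 1, 1, -2, 0, 0],
    6: [0, 0, 0, -1, -1, 0, 2, 0],
    7: [0, 0, 0, 1, 1, 0, 0, -2],
    8: [0, 0, 0, 0, 0, 1, 0, -1],
    9: [0, 0, 0, 0, 0, 0, 1, -1],
}

def contrast_value(weights, means):
    """Weighted sum of the model means for one a priori contrast."""
    return sum(w * m for w, m in zip(weights, means))

# Invented mean values of an index where larger means better fit.
means = [0.60, 0.80, 0.99, 0.97, 0.97, 0.93, 0.99, 0.92]
for number, weights in CONTRASTS.items():
    print(number, round(contrast_value(weights, means), 3))
```

Each row of weights sums to zero, so a contrast value of zero corresponds to
no difference between the compared models. For an index where larger values
mean better fit, a positive value favors the models carrying positive weights.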



Table 3

The Effect of Model and Sample Size on Fit Indices for SSIM Data: Effect Sizes
(Etas and r's) and A Priori Contrasts



a 




b 






Inter- 


c 










Index 


Model


Size 






action 


A Priori Contrasts






















d d d 


d 




d 


d 




Eta


Eta 


r1


r2 


Eta 


1 2 3 4 5 


6 


7 


8 


9 


Stand-alone indices 


















1 FF 


.82tt 


.47tt 


-.43 


-.31 


.14tt 


Ht m H *tt * 


♦ 




♦ 


+11 


2 LHR


.81» 


.SOU 


.45 


.33 


.14tt 


m m *n *tt * 


H 


♦$ 


♦ 


+11 


3 X2


.65tt 


.3itt 


.U 


.33 


.33tt 




♦ 


♦ 


♦ 


+ 


4 X2/df


.64tt 


.38tt 


.38 


.34 


.66» 


m *u* * * 


♦ 


♦ 


- 


♦ 


5 RMR


.84ft 


.41tt 


-.31 


-.39 


.21tt 


m *tt *tt Ht * 


♦tt ♦) 


♦t 


+11 


6 GFI


.Bltt 


.30tt 


.45 


.33 


.16tt 


*tt *tt *tt *tt * 


H 


H 


♦ 


+11 


7 AGFI


J2tt 


.59tt 


.54 


.40 


• nil 


♦II til t t t 




♦ 


♦ 


X 

t 


8 CAK


.40tt 


.90tt 


-.81 


-.60 


.061 


♦It Ht -1 -II * 










9 CSK


.23tt 


.96tt 


-.90 


- LO 
-.00 


.V7M 


Att Att .ft .## A 
♦II ♦!! -II -II ♦ 


-tt 


-tt 


-ii 


II 


10 OCAK 


.&4tt 


.36tt 


.36 


,33 


.6711 


♦It Ht - -St ♦ 










11 OCSK


.58tt 


.SOU 


.48 


.4? 


.6411 


♦tt Ht -at -tt ♦ 


-It 


-tt 


-tt 


-It 


12 CN 


.35tt 


.67tt 


.67 


.60 


.3911 


♦ ♦It ♦ ♦ ♦ 




♦ 


♦ 


♦ 


13 DK 


.93tt 


.02 


.01 


.01 


.10 


♦tt ♦It ♦ * ♦ 


♦ 




♦ 


♦ 


14 MC


.93» 


.02 


.01 


-.01 


.10 


til +11 + ♦ + 


♦ 


♦ 


♦ 


♦ 


15 Z


.78tt 


.29tt 


.28 


.28 


.52tt 


+11 +11 + + + 




♦ 


♦ 


♦ 


Type-1 incremental indices
















16 FFI1


.91tt 


.2&tt 


.25 


.20 


.1811 


^•ft X% X%% X 

♦II ♦ll ♦I HI ♦ 


♦ 


♦ 




♦II 


17 LHRI1


.71tt 


.48tt 


.40 


.28 


.10 


♦tt ♦tt ♦ ♦{ ^ 


♦ 


♦ 


♦ 


♦ 


18 X2I1


.91tt 


.26tt 


.25 


.17 


.loll 


Aftft Aftft Aft Aftft X 
+11 +11 +1 Til + 


♦ 


♦ 


X 

T 


♦II 


19 X2/dfI1 .87tt


.34tt 


.32 


.26 


.21)1 


+11 +lt - + + 




♦ 




♦ 


Parsimony Indices




















20 PIX2


.91tt 


.2Stt 


.25 


.19 


Ant 


+11 -« -II -$l + 


-tt 


-tt 


-tt 


-It 


21 PIRMR


.92tt 


.281$ 


.26 


.20 


.14tt 


-II -II -II -It + 


-tt 


-tt 


-It 


-It 


Type-2 incremental indices
















22 TLI 


.92tt 


.07tt 


-.05 


-.04 


.14tt 


+11 +11 + + + 




♦ 


♦ 


♦ 


23 MMI


.9Stt 


.05 


-.03 


-.02 


.12tt 


+11 +11 + + + 




♦ 


♦ 


♦ 



Note. Results are based on a series of 8 (Models) by 6 (Sample Sizes) ANOVAs
conducted on the simple simulated data set. The TLI and MMI indices are based
on the X2/df and the Dk indices, respectively.
* p < .05; ** p < .01.



a: see Table 1 for a description of the indices. b: Eta is the linear and
nonlinear effects of sample size (N); r1 is the linear effect of the log
sample size (sample sizes are log spaced in this study); and r2 is the linear
effect of sample size. c: For each of the a priori contrasts, a + sign
indicates that the best fit was obtained for the model posited to fit the
best, or for the model with the greatest number of parameters when the
contrasted models were posited to fit equally well (see Table 3 for a
description of the a priori contrasts). d: These contrasts are between models
that are equally able to fit the data.
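
The Note above states that the two Type-2 incremental indices are built on X2/df and on Dk. As a minimal sketch of that construction, and not the authors' own computation, the Python below uses the general Type-2 form (null - target) / (null - expected), where "expected" is the value the underlying index takes when the target model is true. The function names, the example chi-square values, and the definition Dk = (X2 - df) / N are assumptions introduced here for illustration; Table 1 gives the definitions actually used in the study.

```python
def type2_incremental(null_value, target_value, expected_value):
    """Generic Type-2 incremental index: (null - target) / (null - expected)."""
    denom = null_value - expected_value
    if denom == 0:
        raise ValueError("null-model value equals the expected value")
    return (null_value - target_value) / denom


def tli(chi2_null, df_null, chi2_target, df_target):
    """Tucker-Lewis index: Type-2 index on chi-square/df (expected value 1)."""
    return type2_incremental(chi2_null / df_null, chi2_target / df_target, 1.0)


def dk(chi2, df, n):
    """ASSUMPTION: rescaled noncentrality estimate, (chi2 - df) / n."""
    return (chi2 - df) / n


def dk_based_index(chi2_null, df_null, chi2_target, df_target, n):
    """Type-2 index on Dk, taking Dk to have expected value 0 for a true model."""
    return type2_incremental(
        dk(chi2_null, df_null, n), dk(chi2_target, df_target, n), 0.0
    )


if __name__ == "__main__":
    # Hypothetical chi-square values, not taken from the study.
    print(round(tli(900.0, 36, 48.0, 24), 3))
    print(round(dk_based_index(900.0, 36, 48.0, 24, 400), 3))
```

Under the assumed Dk definition, N cancels out of the Dk-based ratio, which is one way to see why such an index could be expected to vary little with sample size.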
Table 4

Effect of Sample Size and Model on Fit Indices for Simple Data for Models 3-8

Index (a)        Model (b)            Size (c)                   Interaction
                 Eta      Complexity  Eta      r1       r2       Eta

Stand-alone indices

 1 FF            .18**        +       .89**   -.79     -.53      .20**
 2 LHR           .18**        +       .90**    [?]      [?]      [?]
 3 X2            .52**        +       .07     -.02     -.01      .07
 4 X2/df         .05          +       .08     -.02     -.01      .10
 5 RMR           .29**        +       .89**   -.86     -.70      [?]
 6 GFI           [?]          +       .89**    .81      .60      [?]
 7 AGFI          .05          +       .92**    .84      .63      [?]
 8 CAK           .05**       [?]      .99**   -.66     -.89      .04*
 9 CSK           [?]         [?]      .99**    [?]      [?]      [?]
10 OCAK          [?]         [?]      .07     -.02     -.01      .08
11 OCSK          .49**       [?]      .84**    .84      .77      [?]
12 CN            .06          +       .82**    .74      .82      [?]
13 DK            .07          +       .12     -.07     -.04      .14
14 MC            .07          +       .10      [?]      .03      .14
15 Z             .05          +       .09     -.03     -.02      .09

Type-1 incremental indices

16 FFI1          .21**        +       .84**    .80      .63      .19**
17 LHRI1         .09          +       .66**   -.56     -.39      .09
18 X2I1          .21**        +       .84**    .80      .63      .19**
19 X2/dfI1       .03          +       .86**    .83      .65      .05

Parsimony indices

20 PIX2          .89**       [?]      .38**    .37      .29      .14**
21 PIRMR         .95**       [?]      .29**    .28      .21      .10**

Type-2 incremental indices

22 TLI           .06          +       .06      .03      .01      .03
23 MMI           .06          +       .02      .01      .00      .06

Note. Results are based on a series of 6 (Models) by 6 (Sample Sizes)
ANOVAs conducted on the simple simulated data set. For purposes of these
analyses only the three-factor models, all of which are able to fit the
data, were included.
* p < .05; ** p < .01.

a: see Table 1 for a description of the indices. b: Because all models
are equally able to fit the data, differences between models are a test of
the relations between each model and model complexity. Under the Complexity
column a + indicates that fit improved with the addition of superfluous
parameters and a - indicates that fit was poorer with the addition of
superfluous parameters. c: Eta is the linear and nonlinear effects of
sample size (N), r1 is the linear effect of the log sample size (sample
sizes are log spaced in this study), and r2 is the linear effect of sample
size.
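
Footnote c distinguishes three summaries of the sample-size effect: Eta, r1, and r2. The sketch below shows one way these could be computed for a single fit index. It is an illustration under stated assumptions, not the authors' analysis: the function name `size_effects`, the six sample sizes, the number of replicates, and the simulated index values are all hypothetical. Eta is taken as the square root of the between-N sum of squares over the total sum of squares, which matches the sample-size eta from a balanced Models by Sample Sizes ANOVA.

```python
import numpy as np


def size_effects(sample_sizes, values):
    """Return (eta, r1, r2) for one fit index.

    sample_sizes: N for each replicate; values: fit-index value per replicate.
    """
    n = np.asarray(sample_sizes, dtype=float)
    y = np.asarray(values, dtype=float)
    grand_mean = y.mean()
    ss_total = np.sum((y - grand_mean) ** 2)
    # Between-group sum of squares, treating each distinct N as one group.
    ss_between = sum(
        np.sum(n == level) * (y[n == level].mean() - grand_mean) ** 2
        for level in np.unique(n)
    )
    eta = np.sqrt(ss_between / ss_total)   # linear plus nonlinear effect of N
    r1 = np.corrcoef(np.log(n), y)[0, 1]   # linear effect of log N
    r2 = np.corrcoef(n, y)[0, 1]           # linear effect of N
    return eta, r1, r2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical log-spaced sample sizes, 10 replicates each.
    sizes = np.repeat([50, 100, 200, 400, 800, 1600], 10)
    # Hypothetical index that rises with log N, plus noise.
    index_values = 0.80 + 0.03 * np.log(sizes) + rng.normal(0.0, 0.02, sizes.size)
    print(size_effects(sizes, index_values))
```

Because a straight line in N or in log N is a special case of arbitrary differences among the group means, Eta can never be smaller than the absolute value of r1 or r2; a large gap between Eta and both correlations signals a nonlinear effect of N.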
Index (a)        Model    Size (b)                   Interaction   A Priori Contrasts (c)
                 Eta      Eta      r1       r2       Eta

Stand-alone indices

 1 FF            .73**    .57**   -.51     -.36      .16**
 2 LHR           .74**    [?]      .52      .38      .14**
 3 X2            [?]      .49**    .44      .49      .62**
 4 X2/df         .57**    .53**    .47      .53      [?]
 5 RMR           .80**    .46**   -.43     -.32      .19**
 6 GFI           .74**    .57**    .50      .36      .16**
 7 AGFI          [?]      .68**    .61      .44      .15**
 8 CAK           .30**    .94**   -.84     -.62      .05
 9 CSK           [?]      .98**   -.91     -.69      .09[?]
10 OCAK          .57**    .50**    .45      .50      .63**
11 OCSK          .49**    .64**    .60      .63      [?]
12 CN            .53**    .48**    .46      .48      .61**
13 DK            [?]      [?]     -.02     -.01      .11
14 MC            .88**    [?]      .02      .00      .12
15 Z             [?]      [?]      .54      .55      .45**

Type-1 incremental indices

16 FFI1          .87**    .37**    .35      .27      [?]
17 LHRI1         .60**    .51**    .35      .22      .09
18 X2I1          .87**    .37**    .35      .27      .15**
19 X2/dfI1       .79**    .47**    .45      .35      .18**

Parsimony indices

20 PIX2          .87**    .34**    .33      .25      .16**
21 PIRMR         .91**    .32**    .28      .20      .14[?]

Type-2 incremental indices

22 TLI           .85**    .07     -.04     -.03      .12
23 MMI           .90**    .04     -.03     -.02      .10

m m H m * m m m m 
m m Ht m * m *n *it m 
m m t m m m m m m 
m m < m m m m * m 
m m m m * m m m m 
m m m m * m m m m 
m m ♦ ♦» ♦ ♦» H ♦ ♦» 
m +<t - *t * ♦« ♦ ♦ ♦» 
ttt ♦» -tt -It ♦ - - -t -t 
♦$$ ♦« - -« m ♦« m ♦ ♦» 
m ♦» -tt - ttt ttt ♦ - -t? 
♦ ttt ♦ ttt ♦ ttt ♦ ♦ ttt 
ttt *tt * Ht * m m * m 

m Ht t ttt m Ht m m m 



m m m ttt * *ti m m m 

ttt ttt ♦ ttt ♦ tt ♦ ♦ ttt 

ttt ttt ttt ttt * m m m m 

ttt t ttt * *tt m * m 



- -tt -tt -tt ♦ -tt -tt -tt -tt 
-tt -tt -tt -tt - -tt -tt -tt -tt 



ttt ttt ♦ ^ts ♦ ttt tt ♦ ttt 
ttt *tt ♦ ttt ♦ ttt tt ♦ *tt 



Note. Results are based on a series of 8 (Models) by 6 (Sample Sizes) ANOVAs
conducted on the complicated simulated data set.

a: see Table 1 for a description of the indices. b: Eta is the linear and
nonlinear effects of sample size (N), r1 is the linear effect of the log
sample size (sample sizes are log spaced in this study), and r2 is the linear
effect of sample size. c: For each of the a priori contrasts, a + sign
indicates that the best fit was obtained for the model posited to fit the
best, or for the model with the greatest number of parameters when the
contrasted models were posited to fit equally well (see Table 3 for a
description of the a priori contrasts). d: These contrasts are between models
that are equally able to fit the data.

* p < .05; ** p < .01.
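
The Model, Size, and Interaction columns report Eta values from a Models by Sample Sizes ANOVA run separately for each index. As a rough sketch of how such values arise in a balanced design, the Python below partitions the total sum of squares for one index into model, sample-size, and interaction components and returns the square root of each share. The function name `two_way_etas`, the array layout, the number of replicates per cell, and the simulated values are assumptions made for illustration only; the study's own replicate counts and software are not given in this section.

```python
import numpy as np


def two_way_etas(y):
    """Eta for model, sample size, and their interaction (balanced design).

    y: array of shape (n_models, n_sizes, n_replicates) holding one fit index.
    """
    y = np.asarray(y, dtype=float)
    n_models, n_sizes, n_reps = y.shape
    grand = y.mean()
    model_means = y.mean(axis=(1, 2))
    size_means = y.mean(axis=(0, 2))
    cell_means = y.mean(axis=2)

    ss_total = np.sum((y - grand) ** 2)
    ss_model = n_sizes * n_reps * np.sum((model_means - grand) ** 2)
    ss_size = n_models * n_reps * np.sum((size_means - grand) ** 2)
    ss_cells = n_reps * np.sum((cell_means - grand) ** 2)
    # Interaction is what the cell means explain beyond the two main effects.
    ss_inter = ss_cells - ss_model - ss_size

    # max(..., 0.0) guards against tiny negative values from rounding error.
    return tuple(
        float(np.sqrt(max(ss, 0.0) / ss_total))
        for ss in (ss_model, ss_size, ss_inter)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical 8 models x 6 sample sizes x 10 replicates.
    model_effect = np.linspace(0.0, 0.2, 8)[:, None, None]
    size_effect = np.linspace(0.0, 0.1, 6)[None, :, None]
    y = 0.7 + model_effect + size_effect + rng.normal(0.0, 0.03, (8, 6, 10))
    print(two_way_etas(y))
```

An index that is independent of N would show a Size Eta near zero here, while a large Interaction Eta would mean that the effect of N differs from one model to another.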